A resource-dependent approach to word modeling for keyword spotting

Authors

  • I-Fan Chen
  • Chin-Hui Lee
Abstract

A hierarchical framework is proposed to address the issues of modeling different types of words in keyword spotting (KWS). Keyword models are built at various levels according to the training resources available for each individual word. The proposed approach improves KWS performance even when no training speech is available for the keywords. It also suggests an easier way to collect training data for these resource-limited words. Experimental results show that the proposed framework improves KWS performance, measured by the figure-of-merit (FOM) metric, regardless of the number of training instances for each keyword. For words with abundant speech data, the proposed method exploits the training data better than the conventional modeling technique and boosts the system FOM from 9.79% to 42.78%. For words with a small amount of training data, the new method increases the system FOM from 29.05% to 49.06%. Even for keywords without any training examples, the new modeling scheme improves the system FOM from 60.96% to 66.51%.
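The resource-dependent idea in the abstract can be illustrated with a minimal sketch that sorts keywords into the three resource conditions the abstract reports on (no training examples, a small amount of training data, abundant speech data). This is not the authors' implementation: the names `ResourceTier`, `resource_tier`, and `group_keywords_by_tier`, and especially the cutoff `ABUNDANT_MIN_INSTANCES`, are assumptions made for illustration, since the abstract does not state where "small" ends and "abundant" begins or which model is built at each level.

```python
from enum import Enum


class ResourceTier(Enum):
    """The three resource conditions named in the abstract."""
    NONE = "no training examples"
    LIMITED = "small amount of training data"
    ABUNDANT = "abundant speech data"


# Hypothetical cutoff: the abstract gives no numeric boundary between tiers.
ABUNDANT_MIN_INSTANCES = 50


def resource_tier(num_instances, abundant_min=ABUNDANT_MIN_INSTANCES):
    """Map a keyword's training-instance count to a resource tier."""
    if num_instances < 0:
        raise ValueError("instance count cannot be negative")
    if num_instances == 0:
        return ResourceTier.NONE
    if num_instances < abundant_min:
        return ResourceTier.LIMITED
    return ResourceTier.ABUNDANT


def group_keywords_by_tier(instance_counts, abundant_min=ABUNDANT_MIN_INSTANCES):
    """Group keywords so each tier can later get its own modeling strategy."""
    groups = {tier: [] for tier in ResourceTier}
    for word, count in instance_counts.items():
        groups[resource_tier(count, abundant_min)].append(word)
    return groups


if __name__ == "__main__":
    # Made-up counts, used only to show the grouping.
    counts = {"alpha": 0, "bravo": 7, "charlie": 120}
    for tier, words in group_keywords_by_tier(counts).items():
        print(f"{tier.value}: {words}")
```

In a full system, each group would then be handed to a different model-building routine; which routine suits which tier is exactly what the paper addresses and is deliberately left out of this sketch.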


Similar resources

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval, in which the search over document images is based on a query word image. This paper proposes an approach to document image retrieval based on keyword spotting, built on a framework that uses relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...


Improving the Performance of a Telephone Keyword Spotting System Using Confidence Score Normalization Based on Linear Programming

Conventional word spotting systems determine hypothesized keywords and their confidence scores using a speech recognizer. These keywords are accepted or rejected by comparing their scores with a specific threshold. It has been shown that the confidence score produced by the recognizer is highly dependent on the sub-word structure of each keyword. So comparing assigned scores to keyw...


A probabilistic method for keyword retrieval in handwritten document images

Keyword retrieval in handwritten document images (word spotting) is very challenging given that OCR accuracy is not yet adequate for handwritten scripts, especially with large lexicons. Various proposed approaches build indices on information such as image features or OCR scores and have improved on the performance of the traditional approach, which builds an index on OCR’ed text. In this paper, we impr...


A Survey on Various Word Spotting Techniques for Content Based Document Image Retrieval

Searching documents for information and retrieving relevant documents is a basic activity. Various tools are readily available for searching and retrieval from digital documents, but few robust methods are available for retrieval from historic documents and old manuscripts, as they are not digitized but available in scanned formats. The conventional way of retrieval from scanned document imag...


A Study on Out-of-vocabulary Word Modeling for a Segment-based Keyword Spotting System

The purpose of a word spotting system is to detect a certain set of keywords in continuous speech. The most common approach consists of models of the keywords augmented with "filler," or "garbage" models, that are trained to account for non-keyword speech and background noise. Another approach is to use a large vocabulary continuous speech recognition system (LVCSR) to produce the most likely hy...



Publication date: 2013